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1.  Introduction 


Urban  warfare  is  one  of  the  deadliest  activities  our  infantry  Soldiers  must  undertake.  Building 
clearing  and  street  fighting  in  built-up  areas  expose  our  troops  to  enemy  combatants  at  close 
range  and  with  little  opportunity  to  assess  the  situation  before  the  encounter.  In  this 
environment,  Soldiers  routinely  use  remotely  controlled  or  teleoperated  robots  to  extend 
situational  awareness  from  nearby  defensive  positions.  This  reduces,  but  does  not  eliminate,  the 
risk  to  the  Soldier.  Fully  autonomous  robots  could  be  sent  ahead  of  troops  to  explore 
environments  with  minimal  supervision.  Mapping  the  environment,  particularly  the  architectural 
layout  and  the  presence  of  people,  is  important  information  to  report  to  Soldiers  before  they 
commit  to  an  area. 

The  robot  is  a  computer-based  machine  and  must  process  data  from  its  sensors  to  extract 
information  useful  to  itself  and  the  Soldier.  In  the  field  of  robotic  perception,  the  core  challenge 
is  to  extract  meaningful  information  about  an  environment  based  on  raw  sensor  data.  Extracting 
such  information  is  difficult,  so  the  perception  scientist  must  attempt  to  simplify  the  general  case 
to  achieve  desired  results.  A  robot  operating  in  an  environment  built  by  and  for  people  can 
benefit  from  some  assumptions  about  its  environment.  One  of  the  most  useful  assumptions  for 
an  indoor  or  urban  setting  is  that  the  environment  is  dominated  by  planar  structures  (walls,  floor, 
ceiling),  which  are  frequently  orthogonal.  Furthermore,  the  location  of  these  architectural 
features  is  very  useful  information  for  situational  awareness.  Fortunately,  sensors  capable  of 
responding  to  these  structures  have  recently  become  available  and  are  relatively  inexpensive. 

The  focus  of  this  effort  is  to  extract  planar  objects,  such  as  walls,  from  the  raw  data  to  assist  in 
navigating  an  unknown  environment  and  to  incrementally  build  maps  to  assist  in  planning  the 
exploration  of  the  environment.  Section  2  will  provide  a  background  of  similar  work  with  other 
sensors,  a  description  of  the  sensor  used  for  these  experiments,  and  a  brief  background  into  other 
research  involving  plane  extraction.  The  remainder  of  the  report  will  focus  on  the  constituents  of 
the  process  of  extracting,  identifying,  and  locating  walls  in  the  data  from  such  a  sensor.  Section 
3  will  discuss  the  current  state  of  the  research,  including  the  algorithms  used  to  extract  and  label 
planes,  as  well  as  estimating  parameters  using  visualization  tools.  Section  4  will  present  the 
results  of  the  plane-finding  algorithm,  in  both  complex  and  simple  static  scenes.  Preliminary 
results  are  presented  for  a  dynamic  scene,  but  efforts  to  date  have  been  dedicated  to  building  and 
developing  the  tools  rather  than  rigorously  evaluating  performance  in  dynamic  situations. 

Finally,  section  5  will  discuss  the  conclusions  and  options  to  proceed  with  further  research. 
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2.  Background 


Two  essential  characteristics  of  a  wall  are  that  it  is  planar  and  vertical.  In  the  context  of  the 
interior  of  a  building,  a  wall  may  be  partially  occluded,  is  typically  adjacent  to  other  walls,  and 
may  be  penetrated  by  doors  or  windows.  Our  robot  should  not  only  identify  that  a  wall  is 
nearby,  but  to  map  the  location  of  the  wall.  As  our  robot  is  mobile,  we  need  frequent  sensor 
updates  to  add  information  from  the  new  sensor  viewpoint  and  to  confirm  information  from 
previous  viewpoints. 

2,1  Useful  Sensors 

While  a  simple  camera  can  provide  this  information  as  long  as  the  robot  is  moving,  location  is 
computationally  intensive  to  extract  from  the  essential  two-dimensional  (2-D)  nature  of  camera 
data.  Stereoscopic  cameras  help  somewhat,  but  technologies  that  intrinsically  deliver  location 
are  available.  If  the  assumptions  of  a  planar,  vertical  operating  environment  reliably  hold,  a 
single-line  horizontal-scanning  laser  range  finder  is  useful  and  inexpensive.  A  device  such  as  the 
SICK  LMS-200  or  the  Hokuyo  URG-04LX,  also  called  a  “line  scanner,”  senses  a  wall  as  a 
straight  line  in  data  converted  to  Cartesian  coordinates.  These  sensors  sweep  a  laser  range  finder 
through  a  360°  rotation,  reporting  the  range  to  the  scene  in  a  linear  frame  at  an  angular  resolution 
of  fractions  of  a  degree  at  a  rate  of  up  to  30  Hz.  There  is  an  entire  body  of  literature  devoted  to 
applying  this  sort  of  sensor  to  navigate  through  hallways  and  exploring  rooms  (Durrant- Whyte 
and  Bailey,  2006;  Gutmann  and  Schlegel,  1996). 

The  reliability  of  detecting  walls  is  degraded,  however,  in  environments  where  the  floor  is  not 
smooth.  This  is  because  sensor  roll  and  pitch  convert  the  wall’s  straight  line  data  plot  to  a  curve, 
confusing  the  tracking  of  features  from  one  frame  to  another.  Another  problem  commonly 
occurs  when  the  elevation  of  the  sensor  causes  it  to  cast  its  single  sensing  beam  on  a  cluttered 
region  where  occlusions  dominate  (e.g.,  the  forest  of  chair  legs  under  a  conference  table),  even 
though  planes  dominate  at  another  elevation  (e.g.,  above  the  conference  table). 

To  overcome  these  limitations,  it  is  common  to  mount  the  line  scanner  on  a  tilt  actuator, 
sometimes  termed  a  “nodding”  scanner  (Harrison  and  Newman,  2008).  Data  is  then  collected 
from  a  sequence  of  planes  pivoted  at  the  actuator  tilt  axis.  The  degree  of  tilt  is  measured  and 
factored  into  the  location  computation  of  the  three-dimensional  (3-D)  point  corresponding  to  a 
range  measurement.  This  technique  retains  much  of  the  accuracy  of  the  line  scanner  and  can 
generate  a  very  dense  point  cloud.  The  data  frame  is  a  composite  of  a  number  of  line  scanner 
scans,  so  the  frame  rate  is  low,  potentially  tens  of  seconds  per  frame.  Consequently,  if  a  frame  is 
collected  from  a  moving  robot,  the  robot  motion  must  also  be  factored  into  the  computation  of  a 
3-D  point.  Since  robot  motion  is  generally  known  only  approximately,  the  accuracy  of  the 
generated  point  cloud  suffers. 
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A  sensor  that  has  recently  become  available  is  the  3-D  time-of- flight  (TOP)  camera,  now 
available  from  at  least  two  makers  (Mesa  Imaging  AG  and  PMDTechnologies  GmbH).  Such  a 
camera  integrates  a  modulated  light  source  synchronized  with  a  complementary  metal-oxide 
semiconductor  (CMOS)  imager  made  of  “intelligent”  pixels.  Each  pixel  calculates  the  phase 
difference  between  the  light  incident  on  it  and  the  illumination  modulation.  The  difference  is 
converted  to  a  time  of  flight,  which  in  turn  is  converted  to  a  distance.  The  measurements  of  all 
the  pixels,  organized  as  a  2-D  array,  are  termed  the  “range  image.”  The  camera  reports  the 
intensity  image,  the  range  image,  and  an  image-structured  cloud  of  Cartesian  points  with  origin 
at  the  camera,  collectively  termed  a  “frame.”  (An  element  of  the  frame  associated  with  a  single 
“intelligent  pixel”  will  hereafter  be  called  a  “pixel.”)  Frames  are  available  at  rates  up  to  30  Hz. 
The  expanded  vertical  field  of  view  (FOV)  of  this  sensor,  typically  30°,  offers  some  ability  to 
detect  and  measure  sensor  pitch  and  roll  effects  in  observations  of  the  environment.  It  also 
enables  observations  across  a  span  of  elevations,  potentially  including  both  the  chair  leg  forest 
and  the  walls  above  the  table,  for  example.  A  final  useful  attribute  of  this  sort  of  sensor  is  that 
frames  of  data  are  acquired  virtually  instantly,  so  low-frequency  sensor  motion  has  little  effect 
on  the  data  collected. 


2,2  SwissRanger 

The  sensor  selected  for  the  study  is  a  3-D  TOF  camera,  model  SR-3000,  from  Mesa  Imaging 
AG,  of  Zurich,  Switzerland.  This  sensor  is  commonly  called  a  SwissRanger.*  Specifications 
are  shown  in  table  1 .  The  sensor  is  small  and  requires  little  power,  making  it  well  suited  to  a 
mobile  research  robot  of  modest  dimension.  The  pixel  count  and  FOV  together  provide  a 
generous  resolution  at  the  ranges  of  which  the  sensor  is  capable. 


Table  1.  Excerpts  from  SwissRanger  SR-3000  spec  sheet. 


Pixel  Array  Size 

176  X  144 

FOV 

47.5°  X  39.6° 

Output  Data  (Per  Pixel) 

Range;  x,  y,  z  coordinates;  i  (intensity) 

Operating  Range  (m) 

0.3 

1 

2 

X-Y  Resolution  (1  Pixel)  (mm) 

1.5 

5.0 

10.0 

Distance  Resolution  (mm) 

2.5 

6 

13 

Nonambiguity  Range 

7.5  m  (for  20-MHz  modulation) 

Illumination  Power  (Optical) 

1  W  (average  power)  at  850  nm 

Adapted  from  the  Mesa  Imaging  AG  SR-3000  Data  Sheet,  rev.  January  2008;  www.mesa-imaging.ch/pdf 
/SR3000_Flyer_Jan07.pdf  (accessed  January  2011). 


SwissRanger  is  a  registered  trademark  of  Mesa  Imaging  AG. 
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The  SR-3000  sensor  suffers  some  limitations  not  apparent  in  the  spee  sheet.  The  spee  sheet  eites 
a  “non-ambiguous  range”  distanee  limitation  of  7.5  m,  but  in  praetiee  the  limit  is  usually  the 
intensity  of  the  modulated  light,  whieh  diminishes  with  distanee  from  the  sensor.  Our 
experiments  seldom  range  beyond  5  m.  Another  limitation  is  ambient  light.  The  spee  sheet 
eautions  that  the  sensor  is  for  “indoor  operation  only.”  Under  indoor  eonditions,  distances  to  an 
undraped  window  are  completely  unreliable.  We  have  not  yet  attempted  to  quantify  these 
observations  but  note  them  as  needing  further  characterization. 

Other  effects  have  been  widely  studied.  In  particular,  nonlinearities  in  the  range  measurements 
of  3-D  TOP  cameras  have  been  documented  (Lindner  and  Kolb,  2007;  May  et  ah,  2009),  along 
with  procedures  for  calibrating  out  the  worst  of  the  error.  These  nonlinearities  have  been 
measured  as  a  diminishing  sinusoid,  with  amplitude  as  high  as  10  cm  with  a  period  as  short  as 
1 .5  m.  The  measurements  of  interest  in  fitting  planes  are  typically  among  nearby  neighbors, 
however,  and  appear  to  be  highly  correlated  in  the  data  we  have  analyzed.  Further  effort  devoted 
to  calibration  is  likely  to  improve  the  accuracy  of  measurement,  but  the  “factory”  calibration  has 
shown  itself  to  be  sufficiently  accurate  for  our  current  application. 

2,3  Plane  Detection 

Two  groups,  whose  writings  we  have  tracked,  have  published  extensively  in  the  realm  of 
extracting  planes  from  point  clouds.  One  group  works  out  of  Jacobs  University,  Bremen,  and  is 
anchored  by  Andreas  Birk,  Kaustubh  Pathak,  Jann  Popinga,  and  Narunas  Vaskevicius.  This 
group  favors  3-D  TOP  sensors,  prefers  a  region-growing  approach  to  plane  extraction  without 
using  local  normals  (Vaskevecius  et  ah,  2007),  and  has  described  an  approach  to  plane  matching 
that  supports  tracking  for  simultaneous  localization  and  mapping  (SLAM)  applications  (Pathak 
et  ah,  2009). 

Another  group  from  Munich  Technical  University  and  Willow  Garage,  including  Radu  Bogdan 
Rusu  and  others  (Marton  et  al,  2009;  Rusu,  2009),  employ  random  sampling  consensus 
(RANSAC)-based  approaches  to  extract  planes  and  other  geometric  primitives  from  a  point 
cloud  data  set.  The  consensus  is  built  on  associations  between  a  planar  model  and  features  of  a 
given  point.  While  the  simplest  associations  are  made  using  the  3-D  location  of  a  point,  a  more 
complex  method  makes  use  of  local  point  normals  as  well. 


3.  Method 


Region-based  growing  methods  have  been  researched  previously  by  the  authors  with  the  intent 
of  developing  a  fast  and  robust  planar  region-growing  technique.  The  resulting  technique 
identifies  planar  patches  given  a  seed  point;  rather  than  the  traditional  region-growing  method  of 
identifying  all  coplanar  points  to  the  seed  point,  the  algorithm  attempts  to  quickly  find  the 
extents  of  the  coplanar  patch.  The  sacrifice  made  is  that  not  every  point  within  the  extents  is 
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tested,  and  thus  less  data  is  used  to  define  a  planar  patch.  Experiments  showed  the  method  to  be 
faster  than  traditional  region  growing,  though  the  method  suffers  the  same  limitations  as  any 
seed-based  algorithm.  That  is,  automatic  selection  of  good  seed  points  is  very  difficult  to 
achieve;  though  it  is  a  well-studied  topic,  it  has  yet  to  be  solved.  This  issue,  combined  with  the 
fact  that  the  software  platform  (ROS.org,  2011a)  supports  RANSAC -based  approaches  over 
region-growing  methods,  led  to  the  choice  to  use  a  RANSAC -based  approach  to  extract  planes. 

3.1  Planar  Extraction 

In  the  simplest  case  of  RANSAC -based  plane  finding,  models  are  built  by  fitting  the  best  plane 
to  a  random  sampling  of  points  from  a  point  cloud  data  set.  The  model  is  then  verified  according 
to  how  many  points  have  a  small  point-to-plane  distance. 

As  shown  in  figure  1,  a  plane  may  be  defined  by  ax  +  by  +  cz  +  d  =  0,  where  n  is  the  unit  normal 
vector  defined  by  (a,b,c),  and  d  is  the  normalized  distance  to  the  origin  of  the  reference  frame  (in 
our  case,  the  origin  of  the  sensor  frame).  To  find  the  distance  of  the  plane  to  a  point  P,  the  point- 
to-plane  distance  Dp  is  derived  as  follows.  The  vector  v  is  the  vector  between  the  origin  and  P, 

and  the  vector  d  is  the  vector  from  the  origin  to  the  closest  point  on  the  plane,  defined  by  d  =  dn. 
The  vector  w  is  defined  as  w  =  v  -  d.  The  point-to-plane  distance  is  the  projection  of  w  onto  the 
unit  normal  n,  given  as  Dp  =  w  •  n  =  v  •  n  —  d. 


Figure  1.  The  point-to-plane  distance  Dp  between  a  point  P  and  a  plane  is 
the  projection  of  the  vector  w  onto  the  plane’s  normal  vector  n. 
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RANSAC  uses  the  point-to-plane  threshold  to  identify  all  points  that  belong  to  the  same  plane, 
and  the  plane  with  the  largest  number  of  points  is  returned.  The  inliers  to  the  plane  are  removed 
from  the  original  point  cloud,  and  the  process  is  repeated  on  the  outliers  of  the  plane  until  at  least 
two-thirds  of  the  data  set  has  been  segmented.  This  method  requires  no  preprocessing  and  is 
therefore  fast  when  compared  with  methods  that  require  normal  vector  estimates  at  each  point. 
Figure  2  illustrates  a  shortcoming  of  the  method.  The  largest  planes  have  been  properly 
identified,  but  many  points  are  improperly  grouped.  This  is  due  to  the  fact  that  they  lie  on  the 
mathematical  plane  defined  in  the  RANSAC  algorithm,  but  they  physically  belong  to  a  distinct 
nonparallel  plane.  Depending  on  which  planar  model  is  randomly  created  first,  the  points  might 
belong  to  one  of  multiple  planes. 


Figure  2.  Simple  RANSAC  algorithm  compared  with  using  local  normals. 

Note:  Top  left:  red  and  yellow  planes  contain  points  that  should  belong  to  distinct  intersecting  planes.  Top  right:  local  normals  are 
used  to  correct  these  artifacts.  Bottom:  normalized  range  image  of  the  scene. 
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Though  possible  postprocessing  steps  could  filter  these  artifacts  out,  a  more  robust  method  is  to 
use  local  normal  estimations  at  each  point  for  the  RANSAC  method.  The  algorithm  works  as 
follows.  At  each  point  in  a  point  cloud,  principal  component  analysis  is  used  to  estimate  the 
point’s  normal  vector.  For  a  query  point  Pq  in  a  point  cloud  C,a{k  +  l)x3  matrix  A  ,  composed 
of  the  3-D  coordinates  of  Pq  and  its  k  nearest  neighbors,  is  created  as  shown  in  equation  1. 
Eigenvalue  decomposition  of  this  matrix  yields  the  three  principal  components  of  the 
neighborhood  around  Pq  (equation  2)-  Since,  for  a  pure  plane,  the  variability  is  entirely  in  two 
directions,  the  eigenvector  associated  with  the  eigenvalue  with  the  smallest  absolute  value  is 
also  the  direction  of  the  plane’s  normal  vector.  For  relatively  small  noise  in  the  sensor  compared 
to  the  planar  size,  this  association  can  be  extended  to  estimate  the  normal  direction  of  the  plane 
passing  through  the  neighborhood  of  points  (n  ~  v^).  This  normal  direction  constitutes  the 

estimated  normal  direction  for  Pq.  The  normal  vector  n  and  the  two  major  eigenvectors 
defining  the  plane  in  the  neighborhood  of  the  query  point  are  depicted  in  figure  3. 


A  = 


X  V  z 

g  X  q  q 

X,  Zj 

Pk  ^k 


(1) 


{A-AJ)v  =  Q 


(2) 


Figure  3.  Principal  component  analysis  used  to  estimate  local  normals. 

Note:  At  left,  the  neighborhood  contains  coplanar  points  and  a  good  local  normal  n  is  estimated.  At  right,  the 
presence  of  a  neighbor  point  with  a  large  offset  affects  normal  estimates. 


7 


This  local  normal  provides  an  extra  feature  to  be  used  for  planar  identifieation.  Now,  not  only 
must  a  point  be  within  a  certain  projected  distance  of  a  planar  model,  but  its  local  normal  must 
also  be  approximately  parallel  to  the  model  normal.  This  method  is  shown  to  yield  improved 
results  (also  shown  in  figure  2),  though  the  loeal  normal  estimation  does  increase  the  eomplexity 
of  the  algorithm. 

Both  the  projected  distance  threshold  and  the  normal  angle  distance  threshold  are  input  to  an 
ROS  custom  method,  derived  from  the  SACSegmentationFromNormals  class  (ROS.org,  2011b), 
for  performing  local  normal-based  RANSAC.  The  current  default  values  of  these  parameters  are 
7.5  em  and  0.05,  respeetively.  These  values  were  determined  by  qualitative  evaluation  using 
visualization  tools. 

3.2  Visualization 

The  loeal  normal  information  is  useful  for  both  plane  identification  and  visualization  purposes. 
Visualization  is  useful  for,  among  other  things,  approximating  a  value  for  the  number  of 
neighbors  k  to  use  for  loeal  normal  ealeulation.  Local  normal  vectors  are  assoeiated  with  each 
point  in  a  point  cloud,  therefore  unit  vectors  can  be  added  to  points  directly.  A  more  efficient 
alternative  is  to  form  an  Extended  Gaussian  Image  (EGI)  (Horn,  1984). 

The  EGI,  for  our  purposes,  is  simply  a  unit  sphere  that  illustrates  the  distribution  of  local  normal 
directions  in  a  point  cloud.  The  set  of  all  loeal  normal  vectors  in  a  point  cloud  is  oriented  such 
that  the  tails  of  the  unit  vectors  are  all  conneeted  at  a  point.  This  point  will  be  the  center  of  a  unit 
sphere,  with  the  unit  normals  interseeting  the  unit  sphere  at  a  point  on  the  sphere.  This  point 
provides  an  intuitive  visualization  of  the  direetion  of  a  normal,  and  this  representation  over  the 
entire  point  eloud  is  the  EGI. 

Examples  of  the  usefulness  of  the  EGI  are  shown  in  figure  4.  The  same  algorithm  is  run  with 
two  different  values  of  k.  In  a  planar  region,  the  EGI  should  have  tight  elusters  of  points.  Eor 
the  given  planar  region,  k  is  shown  to  have  a  strong  impaet  on  the  EGI.  Since  the  accurate  unit 
normal  estimation  is  a  requirement  for  RANSAC,  successful  plane  finding  depends  on  a  well- 
chosen  k  parameter. 

3.3  Labeling  and  Persisting  Planes 

We  eurrently  identify  the  primary  planes  in  a  given  frame  of  point  cloud  data,  then  attempt  to 
assoeiate  planes  in  the  current  frame  with  planes  from  past  frames.  Assoeiations  between  two 
planes  are  made  in  an  analogous  way  to  the  RANSAC  method  already  described.  The  two 
parameters  required  for  normal-based  RANSAC  are  the  point- to-plane  distance  threshold  and  the 
normal  angle  differenee  threshold.  The  same  eriteria  are  used  to  combine  plane  i  from  the  list  of 
current  planes  in  the  eurrent  frame,  returned  by  the  RANSAC  algorithm,  with  plane  j  from  the 
list  of  previously  identified  planes.  Sinee  the  algorithm  depends  on  point-to-plane  distances,  the 
centroid  of  plane  i  is  projected  onto  plane  j.  Eurther,  the  plane  normal  is  compared  with  those 
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Figure  4.  Extended  Gaussian  images  of  the  same  scene,  with  different 

thresholds  set  for  number  of  nearest  neighbors  (k)  to  use  for  local 
normal  estimation. 

Note:  At  left,  k  =  20  yields  poor  results,  with  sensor  measurement  noise  leading  to 

noisy  normal  estimates.  At  right,  k  =  80  mitigates  the  effects  of  sensor  errors  by 
increasing  the  neighborhood  sample  size. 


identified  in  previous  frames.  If  no  mateh  is  found,  it  is  added  to  a  dynamieally  growing  list  of 
planes.  The  planar  definitions,  shown  in  table  2,  are  much  more  compact  than  the  point  data  they 
represent.  The  list  of  planes  is  broadcast  within  the  ROS  environment  to  any  interested  node, 
which  may  wish  to  use  the  planar  definitions  for  higher-level  processing.  If  a  plane  were  defined 
in  a  previous  frame  but  is  not  currently  visible,  it  still  remains  in  the  plane  list  in  case  it  is  seen 
again  in  the  future. 

Table  2.  Current  definition  of  a  plane. 


Planar  Data  Structure 

Field  Type 

Field  Name 

Field  Description 

bool 

is  current 

Distinguishes  between  planes  that  exist  in  the  current  frame  and  those  that 
were  seen  previously 

int 

num pts 

Number  of  points  represented  by  the  plane 

int 

mse 

Mean-squared  error  of  points  (error  is  measured  as  point-to-plane  distances) 

float* 

coef 

Coefficients  (a,b,c,d)  that  define  the  plane 

float* 

centr 

Centroid  of  the  set  of  points  belonging  to  the  plane 

float* 

conv  hull 

Convex  hull  of  the  set  of  points  belonging  to  the  plane  (for  future  work) 

uint 

decay 

Function  of  time  from  when  plane  was  last  seen  (for  future  work) 

Given  proper  associations  between  current  planes  and  those  of  prior  frames,  a  logical  progression 
would  be  to  attempt  to  find  the  best  rotation  and  translation  between  the  current  and  previous 
planes,  thus  providing  an  ego-motion  estimate.  This  is  a  problem  currently  being  researched  by 
Pathak  et  al.  (2010),  although  their  plane  extraction  methods  differ  from  ours.  There  are  other 
methods  currently  applied  to  sets  of  points  that  might  be  extended  to  sets  of  planes,  such  as  the 
Iterative  Closest  Point  (ICP)  algorithm.  Investigation  into  these  topics  is  left  as  future  work. 
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4.  Experimentation 


Experiments  were  performed  to  identify  the  sources  of  uncertainty  of  the  planar  estimation.  For 
the  first  experiment,  the  uncertainty  induced  by  sensor  noise  is  quantified.  For  the  remainder  of 
the  experiments,  we  attempted  to  isolate  uncertainty  induced  by  RANSAC  alone  vs.  the 
uncertainty  of  the  complete  process,  which  incorporates  uncertainty  from  both  the  sensor  data 
and  the  RANSAC  algorithm. 

To  isolate  the  effects  of  RANSAC,  we  extract  a  single  frame  from  the  SwissRanger  and 
repeatedly  run  the  RANSAC  algorithm  on  it  as  if  it  were  a  live  data  stream,  thus  eliminating 
sensor  noise  as  a  source  of  uncertainty.  The  variability  of  planar  estimates  therefore  results 
purely  from  the  random  nature  of  the  RANSAC  algorithm.  Then,  the  full  algorithm  is  run  with 
live  sensor  data,  and  the  results  are  analyzed. 

Statistical  measures  of  variability  for  each  plane  are  described  as  follows.  Results  using  these 
descriptions  are  reported  in  tables  3-6. 

•  Number  of  points:  The  number  of  points  reported  as  belonging  to  the  plane,  both  mean  and 
standard  deviation  (SD),  nptsjnean  and  npts_std.  The  number  of  times  the  plane  was 
represented  in  the  results  is  also  reported  as  percentjd,  with  a  value  of  1.0  representing 
100%. 

•  Plane  angle:  The  planar  normal,  represented  in  sensor  coordinates.  We  report  the  mean 
unit  vector  normaljneans  as  point  coordinates  in  the  sensor  coordinate  frame  and  circular 
standard  deviation  normal_stds  of  plane  normals  for  the  plane  over  all  frames. 

•  Centroid:  The  mean  centrjnean  and  SD  centr_stdXYZ  of  the  centroid  of  the  points 
belonging  to  the  plane.  These  statistics  are  reported  as  point  coordinates  in  the  sensor 
frame. 

•  Centroid  projection:  The  non-negative  point-to-plane  distance  of  the  plane  centroid  onto 
the  plane  defined  by  the  normaljneans  vector.  The  mean  centr  jjrojjnean  and  SD 
centr _proj_std  over  all  frames  are  reported. 

•  Within-plane  point  error:  The  root  mean  square  error  (RMSE)  measured  along  the  plane 
normal  from  a  point  to  its  associated  plane  for  an  individual  plane  within  a  frame.  We 
report  the  mean  RMSjnean  of  the  RMSE  over  all  frames  and  the  corresponding  SD 
RMS_std. 

•  Distance  to  corrected  plane:  The  distance  from  the  sensor  center  to  a  plane  parallel  to  the 
mean  plane  (normal  means)  through  the  frame  centroid.  We  report  the  mean  distance 
plane  jiistjnean  and  SD  plane jiistjtd. 
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4,1  Sensor  Noise  Estimation 


The  objective  of  experiment  1  was  to  determine  the  variability  of  the  sensor  itself  The  measure 
of  variability  is  the  SD  across  frames  of  the  distance  to  a  static  planar  target.  The  distance 
selected  for  the  measure  was  not  the  measured  range,  but  the  component  of  the  range  along  the 
axis  of  the  camera  (the  “z”  component  of  the  Cartesian  point  in  sensor  coordinates,  reported  by 
the  3-D  camera).  Use  of  the  Cartesian  z  component  allows  easy  comparison  of  a  pixel  to  its 
neighbors  in  the  target  frame  and  is  consistent  with  the  intended  use  as  a  sensor  of  planes. 

As  shown  in  figure  5,  the  SD  measure  varies  in  a  nonlinear  fashion  with  distance  from  the  center 
pixel.  At  distances  between  0.5  and  5  m,  pixels  with  SD  >5  cm  occur  only  at  the  comers  of  the 
image.  The  majority  of  pixels  have  SD  between  0.5  and  2  cm.  This  value  is  near  the  resolution 
of  the  sensor  claimed  in  the  SwissRanger  spec  sheet  and  is  consistent  with  the  value  reported  in 
Kahlmann  et  al.  (2007). 


Figure  5.  SDs  for  different  distanees  of  an  SR-3000  and  a  vertieal  plane. 

Note:  For  the  left  and  right  plots,  the  distances  to  the  wall  are  1 .98  and  2.31  m,  respectively.  Tests  showed  that  for  under  5  m  to 
the  wall,  only  the  comers  of  the  image  had  SDs  larger  than  5  cm. 

4,2  Simple  Scene 

In  experiments  2  and  3,  sensor  data  from  a  simple  scene  containing  three  strong  planes  (shown  in 
figure  2)  was  processed  by  our  method  of  plane  extraction.  In  experiment  2,  planes  were 
extracted  from  a  single  frame  of  the  simple  scene.  This  was  repeated  30  times  using  the  same 
sensor  data.  Since  the  sensor  data  is  exactly  the  same  in  each  iteration,  the  variability  in  results 
must  be  a  consequence  of  the  RANSAC  method  of  plane  extraction.  The  results  from 
experiment  2  are  shown  in  table  3. 
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Table  3.  Results  from  experiment  2;  the  evaluation  of  RANSAC  consistency  for  a  single  frame  of  the  simple 
scene  shown  in  figure  2. 


Measures  of  Variability 

Planel 

Plane_2 

Plane_3 

nptsmean 

8.5877e+03 

1.9456e+03 

1.1906e+03 

nptsstd 

14.5150 

120.3187 

101.0541 

percentid 

1 

1 

0.6667 

normal  means  (m) 

0.4854 

-0.7549 

-0.8054 

7.1067e-05 

0.0431 

0.0380 

-0.8743 

-0.6544 

-0.5928 

normal  stds  (deg) 

0.7014 

4.0932 

4.9233 

centr  mean  (m) 

-0.8943 

1.1677 

2.1528 

0.1480 

0.1796 

0.9314 

3.3595 

5.3273 

5.9246 

centr  stdXYZ  (m) 

0.0016 

0.0347 

0.0132 

8.1630e-04 

0.0450 

0.0699 

8.8596e-04 

0.0373 

0.0194 

centr  proj  mean  (m) 

6.2594e-05 

0.0071 

0.0046 

center  proj  std  (m) 

5.9724e-05 

0.0052 

0.0037 

RMS  mean  (m) 

0.0181 

0.0360 

0.0345 

RMS_std  (m) 

0.0053 

0.0046 

0.0089 

plane  dist  mean  (m) 

3.3713 

4.3600 

5.2085 

plane  dist  std  (m) 

8.7293e-05 

0.0089 

0.0060 

In  experiment  3,  planes  were  extraeted  from  a  stream  of  10  frames  of  the  simple  statie  scene.  In 
addition  to  RANSAC  variability,  the  variability  of  the  sensor  affects  results.  However,  the 
simplicity  of  the  scene  makes  matching  errors  unlikely.  The  results  of  experiment  3  are  shown 
in  table  4. 
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Table  4.  Results  from  experiment  3;  a  stream  of  data  from  the  simple  static  scene  shown  in  figure  2. 


Measures  of  Variability 

Planel 

Plane_2 

Plane_3 

nptsmean 

8.5732e+03 

1.9339e+03 

1.2723e+03 

nptsstd 

22.2351 

185.4798 

53.1890 

percentid 

1 

1 

0.6000 

normal  means  (m) 

0.4889 

-0.7433 

-0.7906 

1.7207e-04 

0.0433 

0.0555 

-0.8723 

-0.6675 

-0.6098 

normal  stds  (deg) 

1.1525 

4.4461 

3.1046 

centr  mean  (m) 

0.8976 

1.1689 

2.1455 

0.1454 

0.1500 

0.9238 

3.3636 

5.3328 

5.9169 

centr  stdXYZ  (m) 

0.0041 

0.0302 

0.0045 

0.0015 

0.0825 

0.0196 

0.0026 

0.0428 

0.0089 

centr  proj  mean  (m) 

0.0011 

0.0085 

0.0045 

center  proj  std  (m) 

0.0015 

0.0050 

0.0023 

RMS  mean  (m) 

0.0189 

0.0363 

0.0344 

RMS_std  (m) 

0.0080 

0.0076 

0.0065 

plane  dist  mean  (m) 

3.3730 

4.4222 

5.2530 

plane  dist  std  (m) 

0.0019 

0.0103 

0.0055 

4,3  Complex  Scene 

In  experiments  4  and  5,  sensor  data  from  a  eomplex  seene  eontaining  a  number  of  planes  was 
processed  by  our  plane  extraction  method.  The  complex  scene  used  for  testing  is  shown  in 
figure  6.  Figure  6  also  shows  a  top-down  orthogonal  view,  to  illustrate  the  depth  offset  of  the 
vertical  planes  and  a  range  image. 

In  experiment  4,  planes  were  extracted  from  an  archive  of  30  frames  of  the  complex  scene.  This 
experiment  presents  the  algorithm  with  a  stronger  challenge  in  matching,  but  because  the  data  is 
identical  from  frame  to  frame,  the  only  error  can  come  from  the  random  nature  of  the  RANSAC 
algorithm.  The  results  of  experiment  4  are  shown  in  table  5. 

In  experiment  5,  planes  were  extracted  from  a  stream  of  10  frames  of  a  complex  scene.  As  in 
experiment  3,  errors  from  both  RANSAC  and  the  sensor  affect  results.  In  this  experiment,  a  new 
measure  of  variability  is  added:  the  number  of  new  (different  from  any  encountered  in  the 
previous  frame)  planes  encountered  in  a  frame.  The  scene  is  static,  so  the  number  of  planes 
should  remain  constant  after  initialization.  If  a  plane  is  found  for  which  there  is  no  existing 
match,  it  is  recorded  as  a  new  but  different  plane.  In  a  static  scene  such  as  this  one,  this  indicates 
a  matching  error.  A  frame-to-frame  plot  of  this  measure,  as  well  as  the  total  number  of  planes 
discovered  in  experiment  5,  is  presented  in  the  fourth  quadrant  of  figure  6.  Other  results  of 
experiment  5  are  shown  in  table  6. 
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Figure  6.  Results  from  a  eomplex  seene. 

Note:  Top  left:  predominant  planes  are  displayed  along  with  extended  Gaussian  image.  Top  right:  same  data  from  top-down  view. 
Bottom  left:  range  image  corresponding  to  this  frame.  Bottom  right:  the  number  of  new  planes  entering  the  scene  is  tracked, 
along  with  the  number  of  current  planes  in  a  given  frame.  The  step  in  both  graphs  at  frame  3  demonstrates  that  a  new  plane  was 
identified  in  frame  3  but  then  disappeared  in  frame  4.  This  is  the  result  of  a  tracking  error  in  experiment  5. 
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Table  5.  Results  from  experiment  4;  the  evaluation  of  RANSAC  eonsisteney  for  a  single  frame  of  the 
eomplex  seene  shown  in  figure  6. 


Measures  of  Variability 

Planel 

Plane_2 

Plane_3 

nptsmean 

5.2804e+03 

4.7081e+03 

2.6738e+03 

nptsstd 

74.1241 

82.7101 

213.8191 

percentid 

1 

1 

1 

normal  means  (m) 

-0.5936 

-0.8128 

-0.6961 

0.0415 

0.0155 

0.0596 

-0.8037 

-0.5824 

-0.7155 

normal  stds  (deg) 

1.1573 

1.5767 

2.6868 

centr  mean  (m) 

0.4010 

-0.5047 

-0.1621 

0.9685 

0.0878 

-0.3424 

4.1867 

3.2650 

3.6768 

centr  stdXYZ  (m) 

0.0331 

0.0333 

0.0755 

0.0197 

0.0062 

0.0308 

0.0269 

0.0538 

0.0770 

centr  proj  mean  (m) 

0.0015 

0.0033 

0.0043 

center  proj  std  (m) 

0.0014 

0.0032 

0.0054 

RMS  mean  (m) 

0.0289 

0.0265 

0.0348 

RMS_std  (m) 

0.0065 

0.0130 

0.0420 

plane  dist  mean  (m) 

3.5625 

1.4900 

2.5382 

plane  dist  std  (m) 

0.0021 

0.0046 

0.0069 

Table  6.  Results  of  experiment  5  from  a  stream  of  data  from  the  eomplex  seene  shown  in  figure  6. 


Measures  of  Variability 

Plauel 

Plaue_2 

Plaue_3 

uptsmeau 

5.3143e+03 

4.9971e+03 

2.8782e+03 

uptsstd 

88.8495 

173.4964 

188.5623 

perceutid 

1 

1 

1 

uormal  meaus  (m) 

-0.5826 

-0.8005 

-0.6846 

0.0485 

0.0160 

0.0417 

-0.8113 

-0.5991 

-0.7277 

uormal  stds  (deg) 

1.2900 

2.2566 

1.8974 

ceutr  meau  (m) 

0.4051 

-0.5689 

-0.2208 

0.9210 

0.0597 

-0.3495 

3.9899 

3.1275 

3.5251 

ceutr  stdXYZ  (m) 

0.0274 

0.0320 

0.0623 

0.0156 

0.0162 

0.0107 

0.1237 

0.1174 

0.1020 

ceutr  proj  meau  (m) 

0.0742 

0.0611 

0.0700 

ceuter  proj  std  (m) 

0.0476 

0.0408 

0.0449 

RMS  meau  (m) 

0.0279 

0.0273 

0.0290 

RMS_std  (m) 

0.0080 

0.0134 

0.0079 

plaue  dist  meau  (m) 

3.4284 

1.4174 

2.4288 

plaue  dist  std  (m) 

0.0915 

0.0762 

0.0863 
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4.4  Dynamic  Scene 

Finally,  in  experiment  6,  planes  were  extraeted  from  a  stream  of  10  frames  of  a  eomplex  seene 
(the  interior  of  an  offiee  divided  into  cubicles)  in  which  the  sensor  moves  slowly,  with  both 
rotational  and  translational  components.  The  sensor  has  no  component  reporting  its  pose,  and  all 
sensor  measurements  are  in  the  sensor  reference  frame,  so  when  the  sensor  moves,  it  perceives 
that  the  scene  and  all  its  components  have  moved.  This  experiment  challenges  the  robustness  of 
the  tracking  algorithm,  since  the  tracking  uses  the  prior  definition  of  the  plane  with  no  a  priori 
motion  estimation  to  associate  planes. 

Figure  7  shows  two  consecutive  frames  from  experiment  6  that  differ  visibly  as  a  consequence  of 
sensor  motion.  Surfaces  that  change  color  from  one  frame  to  the  next  indicate  that  our  primitive 
tracking  algorithm  was  unable  to  associate  a  plane  from  the  earlier  frame  to  a  plane  in  its 
successor.  The  “new  plane”  plot  also  shows  evidence  of  difficulties  in  matching  planes  frame- 
to-frame.  While  small  values  of  the  ’’new  plane”  plot  may  indicate  that  a  planar  object  has 
entered  or  exited  the  FOV  or  that  a  matching  error  has  occurred,  large  values  of  the  number  of 
new  planes  probably  indicate  that  sensor  motion  between  frames  was  too  rapid  to  be 
accommodated  by  the  algorithm.  One  can  see  from  the  two  colored  frames  shown  that  the  far 
wall  is  consistently  tracked  while  the  side  wall  and  ceiling  are  mismatched. 

4.5  Discussion 

The  degree  to  which  two  planes  agree  may  be  determined  from  a  comparison  of  their  normal 
vectors  and  distances  to  the  origin.  Since  a  plane’s  distance  to  the  origin  depends  on  the  estimate 
of  the  normal,  which  in  turn  depends  on  the  location  of  the  points  belonging  to  the  plane,  we  use 
the  decoupled  centroid  projection  measure  to  assess  precision. 

For  the  simple  scene,  the  results  from  the  single  data  set  and  the  stream  of  data  indicate 
consistent  plane  extraction.  Differences  between  the  mean  normal  vectors  of  each  plane  in  table 
3  with  associated  planes  in  table  4  are  all  <2°,  and  differences  between  the  projected  centroids 
are  all  <3  cm,  which  is  on  the  order  of  magnitude  of  sensor  noise. 

For  the  complex  scene,  the  mean  normal  vectors  are  also  <2°  different  between  the  single  data 
set  and  the  data  stream  cases,  but  the  projected  centroid  measures  differ  significantly  between  the 
two  cases.  The  single  frame  results  are  more  consistent,  as  expected,  than  those  of  the  data 
stream.  The  SD  of  the  centroid  projection  for  the  data  stream  is  almost  5  cm  for  each  plane. 

A  few  large  outlier  points  would  normally  explain  such  variability;  however,  this  is  not  likely  to 
be  the  case  since  large  outliers  are  filtered  by  the  RANSAC  distance  threshold.  Another  possible 
explanation  is  that  a  larger  number  of  small  outliers  (those  belonging  to  a  parallel  but  offset 
plane)  contribute  to  the  uncertainty.  This  is  consistent  with  the  scene,  shown  in  figure  6,  where 
drawers  or  walls  separated  by  a  door  may  or  may  not  be  classified  as  belonging  to  a  common 
plane. 
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Figure  7.  Results  from  a  complex  dynamic  scene  as  well  as  planar  tracking  plots. 

Note:  Top  left:  results  for  a  frame  of  the  scene.  Top  right:  results  for  a  subsequent  frame.  The  side  wall  is  incorrectly  added  as  a  new 
plane  (from  blue  to  yellow),  while  the  far  wall  is  tracked  properly.  Bottom  left:  range  image  of  the  scene.  Bottom  right:  plots  of 
the  number  of  current  planes  in  a  scene  as  well  as  the  number  of  new  planes  in  the  scene.  Large  discontinuities  in  the  plots  are  due 
to  high  rates  of  rotation  and  translation. 


Comparisons  of  descriptive  statistics  collected  on  plane  parameters  (centroid  and  plane  normal) 
showed  no  apparent  pattern  of  difference  between  results  from  repeated  treatments  of  a  single 
frame  and  results  from  treatments  of  multiple  frames  from  a  single  static  scene  (sections  4.2  and 
4.3).  It  is  not  clear  that  the  measures  are  adequate  to  differentiate  between  noise  from  RANSAC 
and  noise  from  the  sensor  itself.  It  may  be  desirable  in  subsequent  work  to  lump  these  two 
sources  of  variability,  as  the  results  of  our  plane-finding  approach  appear  “good  enough”  for  the 
intended  use. 
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5.  Conclusions 


A  method  for  finding  predominant  planes  in  a  scene  using  a  3-D  TOP  camera  has  been 
presented.  We  have  demonstrated  the  ability  to  find  and  track  planes  in  a  predominantly  planar 
scene,  with  potential  for  real-time  robotic  applications  (depending  on  the  parameter 
configuration,  the  entire  process  runs  at  2-10  Hz).  Much  of  the  code  is  not  optimized,  and  there 
are  currently  costly  routines  that  exist  for  visualization  purposes  only. 

The  effort  to  date  has  largely  constituted  familiarization  with  the  literature,  the  sensor,  and  the 
toolkit  selected.  The  conclusion  is  that  the  ROS  framework  provides  useful  tools  for  extracting 
planes,  and  that  plane  extraction  using  SwissRanger  data  is  sufficient  to  begin  investigating 
methods  to  improve  performance  in  dynamic  scenes. 

The  current  state  of  the  research  presents  various  options  for  future  work,  which  require  some 
prioritization.  Using  the  estimates  in  tracking  will  require  a  robust  matching  algorithm  to  deal 
with  the  variability  in  the  plane  estimates. 

Also,  to  properly  test  the  ability  of  a  robot  to  track  planes  in  a  dynamic  environment,  localization 
measurements  are  needed  to  estimate  egomotion.  Motion  capture  systems  can  be  used  for 
precise  motion  measurement,  or  onboard  sensors  could  independently  provide  validation  using 
localization  techniques. 

Alternatively,  localization  techniques  usually  applied  to  point  or  pixel  features  in  a  scene  might 
be  extended  to  the  case  of  a  small  number  of  plane  features.  This  would  reduce  the 
computational  complexity  of  an  ICP-based  approach,  and  the  planar  features  should  be  less 
prone  to  false  registration  than  a  large  point  cloud.  If  localization  is  achievable,  then  a  plane- 
based  SLAM  technique  could  be  developed. 

As  mentioned  earlier,  automatic  seed  selection  for  region  growing  techniques  could  be 
investigated  for  the  purpose  of  improving  plane-finding  performance  compared  with  local 
normal-based  RANSAC.  Possible  approaches  could  include  gradient-based  processing  on  either 
the  point  cloud  data  or  the  associated  range/intensity  image  to  identify  relatively  nonunique 
points,  which  make  good  seed  candidates. 

Furthermore,  although  we  are  currently  concerned  with  extracting  planes  from  a  data  set,  similar 
methods  are  shown  by  Rusu  (2009)  to  successfully  extract  other  geometric  primitives.  These 
geometric  primitives  are  often  used  to  build  semantic  maps  of  an  environment,  which  use 
geometric  primitives  to  abstract  higher-level  object  definitions,  such  as  tables,  doors,  or  ceilings 
for  the  planar  primitive. 
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Finally,  experiments  should  be  run  that  quantitatively  identify  good  parameters  to  use  to 
optimize  plane  extraction  and  matching.  This  will  likely  require  a  study  of  RANSAC 
parameters.  Quality  will  be  assessed  in  terms  of  (at  least)  extraction  accuracy  and  process 
runtime. 
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